{ggstatsplot}: Informative Statistical Visualizations

Indrajeet Patil

Why {ggstatsplot}?

Current CRAN package count >23,000




ggstatsplot provides

📊 information-rich plots with statistical details

📝 suitable for faster (exploratory) data analysis and reporting

Informative graphic = a thousand words

Graphical summaries can reveal problems not visible from numerical statistics.

Ready-made plot = no customization

The grammar of graphics is a powerful framework (Wilkinson, 2011) and can help you make any graphics fitting your specific data visualization needs! But…

Quality of Life (QoL) improvements with ggstatsplot

Provide ready-made plots with defaults following the best practices in statistical reporting and data visualization.

Simpler/faster data analysis workflow

In a typical exploratory data analysis workflow, data visualization and statistical modeling are two different phases: visualization informs modeling, and modeling can suggest a different visualization, and so on and so forth.

Central idea of ggstatsplot

Simple: combine these two phases into one!

And a LOT more!

…but we will come back to that later 📌

Let’s get started first!


Package available for installation on CRAN and GitHub:

Type Command
Release install.packages("ggstatsplot")
Development pak::pak("IndrajeetPatil/ggstatsplot")

Example function

ggbetweenstats()

For between-group comparisons

ggbetweenstats(
  data  = iris,
  x     = Species,
  y     = Sepal.Length,
  title = "Distribution of sepal length across Iris species"
)

Important

✏️ Defaults

  • raw data + distributions
  • descriptive statistics
  • inferential statistics
  • effect size + uncertainty
  • pairwise comparisons
  • Bayesian hypothesis-testing
  • Bayesian estimation

Statistical approaches available

  • parametric
  • non-parametric
  • robust
  • Bayesian
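
The statistical details listed above are drawn on the plot, but they can also be inspected as data frames. A minimal sketch, assuming a recent ggstatsplot version in which `extract_stats()` returns a named list with elements such as `subtitle_data`, `caption_data`, and `pairwise_comparisons_data`:

```r
library(ggstatsplot)

p <- ggbetweenstats(
  data = iris,
  x    = Species,
  y    = Sepal.Length
)

# pull out the data frames that were used to annotate the plot
stats <- extract_stats(p)

stats$subtitle_data              # frequentist test, effect size, and its CI
stats$caption_data               # Bayesian counterpart shown in the caption
stats$pairwise_comparisons_data  # pairwise comparisons drawn on the plot
```

This is handy when the numbers are needed in a table or in running text rather than only on the figure.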

Other functions

Benefits for Statistical Reporting

Results in context of the data 🕵️

Standard approach

Pearson’s correlation test revealed that, across 142 participants, variable x was negatively correlated with variable y: \(t(140)=-0.76, p=.446\). The effect size \((r=-0.06, 95\% CI [-.23,.10])\) was small, as per Cohen’s (1988) conventions. The Bayes Factor for the same analysis revealed that the data were 5.81 times more probable under the null hypothesis as compared to the alternative hypothesis. This can be considered moderate evidence (Jeffreys, 1961) in favor of the null hypothesis (absence of any correlation between x and y).

ggstatsplot approach
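
A rough sketch of what this approach could look like in code. The data set behind the reported result is not given here, so the example uses simulated stand-in data (142 observations of two unrelated variables); the numbers it produces will not match the ones quoted above.

```r
library(ggstatsplot)

# hypothetical stand-in data: 142 observations, no true correlation
set.seed(123)
df <- data.frame(x = rnorm(142), y = rnorm(142))

# one call: scatterplot plus correlation test, effect size with CI,
# and the Bayes Factor, all shown alongside the raw data
p <- ggscatterstats(
  data = df,
  x    = x,
  y    = y,
  type = "parametric"
)

p
```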

Toggling statistical approaches 🔀

Parametric

# anova
ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "p" 
)

# correlation analysis
ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  type = "p" 
)

# t-test
gghistostats(
  data = mtcars,
  x = wt,
  test.value = 2,
  type = "p" 
)

Non-parametric

# anova (Kruskal-Wallis test)
ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "np" 
)

# correlation analysis (Spearman's rank correlation)
ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  type = "np" 
)

# one-sample test (Wilcoxon signed-rank test)
gghistostats(
  data = mtcars,
  x = wt,
  test.value = 2,
  type = "np" 
)
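
The remaining two approaches are toggled the same way. A short sketch using the same `mtcars` comparison, with `type = "r"` for the robust approach and `type = "bayes"` for the Bayesian one:

```r
library(ggstatsplot)

# robust: comparison based on trimmed means
p_robust <- ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "r"
)

# Bayesian: Bayes Factor and posterior estimates
p_bayes <- ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "bayes"
)
```

Only the `type` argument changes; the rest of the call stays the same.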

Alternative: Pure Pain

Hunting for packages

📦 for inferential statistics ({stats})
📦 computing effect size + CIs (effectsize)
📦 for descriptive statistics (skimr)
📦 pairwise comparisons (multcomp)
📦 Bayesian hypothesis testing (BayesFactor)
📦 Bayesian estimation (bayestestR)
📦 …

Inconsistent APIs

🤔 accepts data frame, vector, matrix?
🤔 long/wide format data?
🤔 works with NAs?
🤔 returns data frame, vector, matrix?
🤔 works with tibbles?
🤔 has all necessary details?
🤔 …
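
To make the inconsistency concrete, here is a small sketch that uses only functions from {stats}. The choice of functions is illustrative, not the only way to do this by hand; the point is that even within one package the inputs and return types differ.

```r
# treat the number of cylinders as a grouping factor
df <- transform(mtcars, cyl = factor(cyl))

# inferential statistics: formula + data in, "htest" list out
anova_res <- oneway.test(wt ~ cyl, data = df, var.equal = FALSE)

# descriptive statistics: formula + data in, data frame out
desc_res <- aggregate(wt ~ cyl, data = df, FUN = mean)

# pairwise comparisons: two vectors in, list holding a matrix of p-values out
pair_res <- pairwise.t.test(df$wt, df$cyl, p.adjust.method = "holm")

class(anova_res)
class(desc_res)
class(pair_res$p.value)
```

Effect sizes, confidence intervals, and Bayes Factors would each need yet another function, typically from yet another package.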

Customizability

“What if I don’t like the default plots?” 🤔

Changing aesthetics 🎨

ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating,
  ggtheme = ggthemes::theme_economist(), 
  palette = "Darjeeling2", 
  package = "wesanderson" 
)

Aesthetic preferences are not an excuse to avoid ggstatsplot! 😻 Any ggplot2 theme or palette can be used.

N.B. The default palette is colorblind-friendly.

Modification with {ggplot2} 🛠

You can modify ggstatsplot plots further using ggplot2 functions. 🎉

ggbetweenstats(
  data = mtcars,
  x = am,
  y = wt,
  type = "bayes"
) +
  scale_y_continuous(sec.axis = dup_axis()) 

Too much information 🙈

Get only plots:

ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  # turn off statistical analysis
  centrality.plotting = FALSE, 
  results.subtitle = FALSE, 
  bf.message = FALSE, 
  # turn off pairwise comparisons
  pairwise.display = "none" 
)

Get only expressions:

stats_expr <- ggpiestats(
  data = Titanic_full,
  x = Survived,
  y = Sex
) %>% extract_subtitle()

ggiraphExtra::ggSpine( 
  data = Titanic_full,
  aes(x = Sex, fill = Survived)
) +
  labs(subtitle = stats_expr)  
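
If no ggstatsplot plot is needed at all, the expression can also be computed directly. A minimal sketch, assuming the statsExpressions package (listed in the session information below) is installed and that its `oneway_anova()` result carries the ready-made expression in a list column named `expression`:

```r
library(statsExpressions)
library(ggplot2)

# one-row data frame with test statistic, p-value, effect size, and CI
res <- oneway_anova(
  data = iris,
  x    = Species,
  y    = Sepal.Length
)

# the pre-formatted expression lives in a list column
stats_label <- res$expression[[1]]

# use it in a plot built from scratch
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(subtitle = stats_label)
```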

Critical Evaluation

Things to be wary of

“Golem of Prague” issue

Promotes mindless application of statistical tests.

Easy-to-use software can lead to misuse.

Clunky API

  • Too many arguments to remember.
  • Not a “real” ggplot2 extension.
  • Limited number of functions.
  • Statistical proficiency needed.

Attractive Qualities

Things that will pull you in

Quality Assurance

Each commit must pass many QA checks:

CI Checks (GitHub Actions)

  • Unit tests (random-order)
  • Code coverage (100%)
  • Linting (0 lints)
  • Formatting (0 issues)
  • Documentation (website, link rot, examples)
  • CRAN checks (0 E, 0 W, 0 N)
  • Pre-commit hooks (0 issues)
  • Portability (Linux, macOS, Windows)
  • Robustness (dependencies, R versions)

Healthy and active code base

User Love

Total downloads > 500K (97th percentile)

library(packageRank)
plot(
  cranDownloads("ggstatsplot", from = "2018-04-03", to = Sys.Date()),
  graphics = "ggplot2", smooth = TRUE
)

Total citations > 1000

Conclusion

Benefits of the ggstatsplot approach

ggstatsplot, a package that combines data visualization and statistical analysis in a single step, is a powerful tool that:

  • provides ready-made plots with defaults that are information-rich
  • minimizes the chances of making errors in statistical reporting
  • follows best practices in data visualization and statistical reporting
  • highlights the importance of the effect by providing effect size measures by default
  • provides an easy way to evaluate the absence of an effect using a Bayesian framework
  • helps evaluate statistical analysis in the context of the underlying data
  • is simple enough that somebody with little coding experience can use it without making an error

For more

Source code for these slides can be found on GitHub.

If you are interested in good programming and software development practices, check out my other slide decks.

Find me at…

Twitter

LinkedIn

GitHub

Website

E-mail

Thank You 😊

Session information

sessioninfo::session_info(include_base = TRUE)
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.2 (2024-10-31)
 os       macOS Sequoia 15.1
 system   aarch64, darwin20
 hostname MacBookAir.fritz.box
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Berlin
 date     2024-11-10
 pandoc   3.5 @ /usr/local/bin/ (via rmarkdown)
 quarto   1.6.33 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package          * version    date (UTC) lib source
 base             * 4.4.2      2024-11-01 [2] local
 BayesFactor        0.9.12-4.7 2024-01-24 [2] CRAN (R 4.4.0)
 bayestestR         0.15.0     2024-10-17 [1] CRAN (R 4.4.1)
 bitops             1.0-8      2024-07-29 [2] CRAN (R 4.4.1)
 BWStest            0.2.3      2023-10-10 [2] CRAN (R 4.4.0)
 cachem             1.1.0      2024-05-16 [1] CRAN (R 4.4.0)
 cli                3.6.3      2024-06-21 [1] CRAN (R 4.4.0)
 coda               0.19-4.1   2024-01-31 [2] CRAN (R 4.4.0)
 codetools          0.2-20     2024-03-31 [2] CRAN (R 4.4.2)
 colorspace         2.1-1      2024-07-26 [2] CRAN (R 4.4.0)
 compiler           4.4.2      2024-11-01 [2] local
 correlation        0.8.6      2024-10-26 [1] CRAN (R 4.4.1)
 cranlogs           2.1.1      2019-04-29 [2] RSPM (R 4.4.0)
 curl               6.0.0      2024-11-05 [1] CRAN (R 4.4.1)
 data.table         1.16.0     2024-08-27 [2] CRAN (R 4.4.1)
 datasets         * 4.4.2      2024-11-01 [2] local
 datawizard         0.13.0     2024-10-05 [1] CRAN (R 4.4.1)
 digest             0.6.37     2024-08-19 [1] CRAN (R 4.4.1)
 dplyr              1.1.4      2023-11-17 [2] CRAN (R 4.4.0)
 effectsize         0.8.9      2024-07-03 [1] CRAN (R 4.4.0)
 emmeans            1.10.4     2024-08-21 [2] CRAN (R 4.4.1)
 estimability       1.5.1      2024-05-12 [2] RSPM (R 4.4.0)
 evaluate           1.0.1      2024-10-10 [1] CRAN (R 4.4.1)
 fansi              1.0.6      2023-12-08 [1] CRAN (R 4.4.0)
 farver             2.1.2      2024-05-13 [2] RSPM (R 4.4.0)
 fastmap            1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
 fortunes           1.5-4      2016-12-29 [2] RSPM (R 4.4.0)
 generics           0.1.3      2022-07-05 [2] CRAN (R 4.4.0)
 ggiraph            0.8.10     2024-05-17 [2] RSPM (R 4.4.0)
 ggiraphExtra       0.3.0      2020-10-06 [2] RSPM (R 4.4.0)
 ggplot2          * 3.5.1      2024-04-23 [2] CRAN (R 4.4.0)
 ggrepel            0.9.6      2024-09-07 [2] CRAN (R 4.4.1)
 ggsignif           0.6.4      2022-10-13 [2] CRAN (R 4.4.0)
 ggstatsplot      * 0.12.5     2024-11-01 [1] CRAN (R 4.4.1)
 ggthemes           5.1.0      2024-02-10 [2] CRAN (R 4.4.0)
 glue               1.8.0      2024-09-30 [1] CRAN (R 4.4.1)
 gmp                0.7-5      2024-08-23 [2] CRAN (R 4.4.1)
 graphics         * 4.4.2      2024-11-01 [2] local
 grDevices        * 4.4.2      2024-11-01 [2] local
 grid               4.4.2      2024-11-01 [2] local
 gtable             0.3.5      2024-04-22 [2] CRAN (R 4.4.0)
 htmltools          0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
 htmlwidgets        1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
 httr               1.4.7      2023-08-15 [2] RSPM
 insight            0.20.5     2024-10-02 [1] CRAN (R 4.4.1)
 jsonlite           1.8.9      2024-09-20 [1] CRAN (R 4.4.1)
 knitr              1.49       2024-11-08 [1] CRAN (R 4.4.2)
 kSamples           1.2-10     2023-10-07 [2] CRAN (R 4.4.0)
 labeling           0.4.3      2023-08-29 [2] CRAN (R 4.4.0)
 lattice            0.22-6     2024-03-20 [2] CRAN (R 4.4.2)
 lifecycle          1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
 lubridate          1.9.3      2023-09-27 [2] RSPM
 magrittr           2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
 MASS               7.3-61     2024-06-13 [2] CRAN (R 4.4.2)
 Matrix             1.7-1      2024-10-18 [2] CRAN (R 4.4.2)
 MatrixModels       0.5-3      2023-11-06 [2] RSPM
 memoise            2.0.1      2021-11-26 [1] CRAN (R 4.4.0)
 methods          * 4.4.2      2024-11-01 [2] local
 mgcv               1.9-1      2023-12-21 [2] CRAN (R 4.4.2)
 multcomp           1.4-26     2024-07-18 [2] CRAN (R 4.4.0)
 multcompView       0.1-10     2024-03-08 [2] RSPM
 munsell            0.5.1      2024-04-01 [2] CRAN (R 4.4.0)
 mvtnorm            1.3-1      2024-09-03 [2] CRAN (R 4.4.1)
 mycor              0.1.1      2018-04-10 [2] RSPM (R 4.4.0)
 nlme               3.1-166    2024-08-14 [2] CRAN (R 4.4.2)
 packageRank      * 0.9.2      2024-08-01 [2] CRAN (R 4.4.0)
 paletteer          1.6.0      2024-01-21 [2] CRAN (R 4.4.0)
 parallel           4.4.2      2024-11-01 [2] local
 parameters         0.23.0     2024-10-18 [1] CRAN (R 4.4.1)
 patchwork          1.3.0      2024-09-16 [1] CRAN (R 4.4.1)
 pbapply            1.7-2      2023-06-27 [2] CRAN (R 4.4.0)
 performance        0.12.4     2024-10-18 [1] CRAN (R 4.4.1)
 pillar             1.9.0      2023-03-22 [1] CRAN (R 4.4.0)
 pkgconfig          2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
 pkgsearch          3.1.3      2023-12-10 [2] RSPM (R 4.4.0)
 plyr               1.8.9      2023-10-02 [2] CRAN (R 4.4.0)
 PMCMRplus          1.9.12     2024-09-08 [1] CRAN (R 4.4.1)
 ppcor              1.1        2015-12-03 [2] RSPM (R 4.4.0)
 prismatic          1.1.2      2024-04-10 [2] RSPM
 purrr              1.0.2      2023-08-10 [1] CRAN (R 4.4.0)
 R.methodsS3        1.8.2      2022-06-13 [1] CRAN (R 4.4.0)
 R.oo               1.27.0     2024-11-01 [1] CRAN (R 4.4.1)
 R.utils            2.12.3     2023-11-18 [1] CRAN (R 4.4.0)
 R6                 2.5.1      2021-08-19 [1] CRAN (R 4.4.0)
 RColorBrewer       1.1-3      2022-04-03 [2] CRAN (R 4.4.0)
 Rcpp               1.0.13-1   2024-11-02 [1] CRAN (R 4.4.1)
 RcppParallel       5.1.9      2024-08-19 [2] CRAN (R 4.4.1)
 RCurl              1.98-1.16  2024-07-11 [2] CRAN (R 4.4.0)
 rematch2           2.1.2      2020-05-01 [2] RSPM
 reshape2           1.4.4      2020-04-09 [2] RSPM
 rlang              1.1.4      2024-06-04 [1] CRAN (R 4.4.0)
 rmarkdown          2.29       2024-11-04 [1] CRAN (R 4.4.1)
 Rmpfr              0.9-5      2024-01-21 [2] RSPM
 rstantools         2.4.0      2024-01-31 [2] RSPM
 rstudioapi         0.17.1     2024-10-22 [1] CRAN (R 4.4.1)
 sandwich           3.1-0      2023-12-11 [2] CRAN (R 4.4.0)
 scales             1.3.0      2023-11-28 [2] CRAN (R 4.4.0)
 sessioninfo        1.2.2.9000 2024-11-09 [1] Github (r-lib/sessioninfo@37c81af)
 sjlabelled         1.2.0      2022-04-10 [2] RSPM (R 4.4.0)
 sjmisc             2.8.10     2024-05-13 [2] RSPM (R 4.4.0)
 snakecase          0.11.1     2023-08-27 [2] CRAN (R 4.4.0)
 splines            4.4.2      2024-11-01 [2] local
 stats            * 4.4.2      2024-11-01 [2] local
 statsExpressions   1.6.1      2024-10-31 [1] CRAN (R 4.4.1)
 stringi            1.8.4      2024-05-06 [2] RSPM (R 4.4.0)
 stringr            1.5.1      2023-11-14 [2] RSPM
 sugrrants          0.2.9      2024-03-12 [2] RSPM (R 4.4.0)
 SuppDists          1.1-9.8    2024-09-03 [2] CRAN (R 4.4.1)
 survival           3.7-0      2024-06-05 [2] CRAN (R 4.4.2)
 systemfonts        1.1.0      2024-05-15 [1] CRAN (R 4.4.0)
 TH.data            1.1-2      2023-04-17 [2] CRAN (R 4.4.0)
 tibble             3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
 tidyr              1.3.1      2024-01-24 [2] CRAN (R 4.4.0)
 tidyselect         1.2.1      2024-03-11 [2] CRAN (R 4.4.0)
 timechange         0.3.0      2024-01-18 [2] RSPM
 tools              4.4.2      2024-11-01 [2] local
 utf8               1.2.4      2023-10-22 [1] CRAN (R 4.4.0)
 utils            * 4.4.2      2024-11-01 [2] local
 uuid               1.2-1      2024-07-29 [2] CRAN (R 4.4.1)
 vctrs              0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
 withr              3.0.2      2024-10-28 [1] CRAN (R 4.4.1)
 xfun               0.49       2024-10-31 [1] CRAN (R 4.4.1)
 xtable             1.8-4      2019-04-21 [1] CRAN (R 4.4.0)
 yaml               2.3.10     2024-07-26 [1] CRAN (R 4.4.1)
 zeallot            0.1.0      2018-01-28 [2] RSPM
 zoo                1.8-12     2023-04-13 [2] CRAN (R 4.4.0)

 [1] /Users/indrajeetpatil/Library/R/arm64/4.4/library
 [2] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────

Appendix

Examples of other functions

ggwithinstats()

Hypothesis about group differences: repeated measures design

ggwithinstats(
  data = WRS2::WineTasting,
  x = Wine,
  y = Taste
)

Important

✏️ Defaults

  • raw data + distributions
  • descriptive statistics
  • inferential statistics
  • effect size + uncertainty
  • pairwise comparisons
  • Bayesian hypothesis-testing
  • Bayesian estimation

Statistical approaches available

  • parametric
  • non-parametric
  • robust
  • Bayesian

gghistostats()

Distribution of a numeric variable

gghistostats(
  data = movies_long,
  x = budget,
  test.value = 30 
)

Important

✏️ Defaults

  • counts + proportion for bins
  • descriptive statistics
  • inferential statistics
  • effect size + uncertainty
  • pairwise comparisons
  • Bayesian hypothesis-testing
  • Bayesian estimation

Statistical approaches available

  • parametric
  • non-parametric
  • robust
  • Bayesian

ggdotplotstats()

Labeled numeric variable

ggdotplotstats(
  data = movies_long,
  x = budget,
  y = genre,
  test.value = 30 
)

Important

✏️ Defaults

  • descriptive statistics
  • inferential statistics
  • effect size + uncertainty
  • pairwise comparisons
  • Bayesian hypothesis-testing
  • Bayesian estimation

Statistical approaches available

  • parametric
  • non-parametric
  • robust
  • Bayesian

ggscatterstats()

Hypothesis about correlation: Two numeric variables

ggscatterstats(
  data = movies_long,
  x = budget,
  y = rating
)

Important

✏️ Defaults

  • joint distribution
  • marginal distribution
  • effect size + uncertainty
  • pairwise comparisons
  • Bayesian hypothesis-testing
  • Bayesian estimation

Statistical approaches available

  • parametric
  • non-parametric
  • robust
  • Bayesian

ggcorrmat()

Hypothesis about correlation: Multiple numeric variables

ggcorrmat(dplyr::starwars)

Important

✏️ Defaults

  • inferential statistics
  • effect size + uncertainty
  • careful handling of NAs
  • partial correlations

Statistical approaches available

  • parametric
  • non-parametric
  • robust
  • Bayesian

ggpiestats()

Hypothesis about composition of categorical variables

ggpiestats(
  data = mtcars,
  x = am,
  y = cyl
)

Important

✏️ Defaults

  • descriptive statistics
  • inferential statistics
  • effect size + uncertainty
  • goodness-of-fit tests
  • Bayesian hypothesis-testing
  • Bayesian estimation

ggbarstats()

Hypothesis about composition of categorical variables

ggbarstats(
  data = mtcars,
  x = am,
  y = cyl
)

Important

✏️ Defaults

  • descriptive statistics
  • inferential statistics
  • effect size + uncertainty
  • goodness-of-fit tests
  • Bayesian hypothesis-testing
  • Bayesian estimation

ggcoefstats()

Hypothesis about regression coefficients

mod <- lm(
  formula = rating ~ mpaa,
  data = movies_long
)

ggcoefstats(mod)

Important

✏️ Defaults

  • estimate + uncertainty
  • inferential statistics (\(t\), \(z\), \(F\), \(\chi^2\))
  • model fit indices (AIC + BIC)

Supports all regression models supported in the {easystats} ecosystem.

Meta-analysis is also supported!

grouped_ variants

Iterating over a grouping variable

grouped_ functions

grouped_ggpiestats(
  data = mtcars,
  x = cyl,
  grouping.var = am 
)

Available grouped_ variants:

  • grouped_ggbetweenstats()
  • grouped_ggwithinstats()
  • grouped_gghistostats()
  • grouped_ggdotplotstats()
  • grouped_ggscatterstats()
  • grouped_ggcorrmat()
  • grouped_ggpiestats()
  • grouped_ggbarstats()
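
The grouped_ variants return a single combined plot, whose layout and overall title can be adjusted. A short sketch, assuming the `plotgrid.args` and `annotation.args` arguments (lists passed on to the underlying patchwork layout and annotation); the title text is made up for illustration:

```r
library(ggstatsplot)

p <- grouped_ggscatterstats(
  data            = mtcars,
  x               = wt,
  y               = mpg,
  grouping.var    = am,
  # arrange the per-group plots side by side
  plotgrid.args   = list(nrow = 1),
  # add a common title above all panels
  annotation.args = list(title = "Weight vs. fuel economy by transmission type")
)

p
```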

More {ggstatsplot} benefits

Supports different statistical approaches

Note

Functions Description Parametric Non-parametric Robust Bayesian
ggbetweenstats() Between group comparisons
ggwithinstats() Within group comparisons
gghistostats(), ggdotplotstats() Distribution of a numeric variable
ggcorrmat() Correlation matrix
ggscatterstats() Correlation between two variables
ggpiestats(), ggbarstats() Association between categorical variables NA NA
ggpiestats(), ggbarstats() Equal proportions for categorical variable levels NA NA
ggcoefstats() Regression modeling
ggcoefstats() Random-effects meta-analysis NA

Best practices in statistical reporting 🏆

Avoiding reporting errors

“half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion”

(Nuijten et al., Behavior Research Methods, 2016)

Since the plot and the statistical analysis are yoked together, the chances of making an error in reporting the results are minimized.

No need to worry about updating figures and statistical details separately. 🔗
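
The kind of inconsistency described in the quote can be checked by hand. As a small worked example, take the correlation result quoted earlier in these slides, \(t(140)=-0.76, p=.446, r=-0.06\), and recompute the p-value and the correlation from the test statistic and its degrees of freedom. The tolerance used below is an arbitrary choice that allows for rounding in the reported values.

```r
# reported values from the correlation example
t_value    <- -0.76
df         <- 140
p_reported <- 0.446
r_reported <- -0.06

# two-sided p-value implied by t and its degrees of freedom
p_implied <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)

# correlation implied by the same statistic: r = t / sqrt(t^2 + df)
r_implied <- t_value / sqrt(t_value^2 + df)

# consistency checks (tolerance of 0.01 is an assumption)
p_consistent <- abs(p_implied - p_reported) < 0.01
r_consistent <- isTRUE(all.equal(round(r_implied, 2), r_reported))

c(p_consistent = p_consistent, r_consistent = r_consistent)
```

When the plot and the statistics come from one function call, this kind of manual cross-check is not needed.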

Making sense of null results

\(p > 0.05\): The null hypothesis (H0) can’t be rejected

But can it be accepted?! Null Hypothesis Significance Testing (NHST) cannot say. 🤫

“In 72% of cases, nonsignificant results were misinterpreted, in that the authors inferred that the effect was absent. A Bayesian reanalysis revealed that fewer than 5% of the nonsignificant findings provided strong evidence (i.e., \(BF_{01} > 10\)) in favor of the null hypothesis over the alternative hypothesis.”

(Aczel et al., AMPPS, 2018)

Juxtaposing frequentist and Bayesian statistics for the same analysis helps to properly interpret the null results.
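
A small sketch of how a Bayes Factor in favor of the null can be read. The helper `describe_bf()` is hypothetical and not part of ggstatsplot; the cut-off of 10 for strong evidence follows the quote above, while the cut-off of 3 for moderate evidence is an assumed convention.

```r
# BF01: how much more probable the data are under H0 than under H1
bf01 <- 5.81

# BF10 is simply the reciprocal
bf10 <- 1 / bf01

# hypothetical helper: verbal label for a Bayes Factor
describe_bf <- function(bf) {
  if (bf < 1) return("favors the other hypothesis")
  if (bf < 3) return("anecdotal")
  if (bf <= 10) return("moderate")
  "strong"
}

describe_bf(bf01)
```

With these cut-offs, the value from the correlation example earlier in the slides falls in the moderate range, matching the wording used there.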

A few other benefits

Minimal code needed (data, x, y): minimizes chances of error + tidy scripts. 💅

Disembodied figures stand on their own and are easy to evaluate. 🧐

More breathing room for theoretical discussion and other text. ✍

Misconceptions: This package is…


❌ an alternative to learning ggplot2
✅ the more you know ggplot2, the better you can modify the defaults to your liking

❌ meant to be used in talks/presentations
✅ the defaults are too complicated for effectively communicating results in time-constrained presentation settings, e.g. conference talks

❌ only relevant when used in publications
✅ not necessarily; it can also be useful during the exploratory phase alone

❌ the only game in town
✅ there is excellent open-source GUI software: JASP and jamovi